ABSTRACT


This paper is concerned with the problem of simultaneous testing for n-component decisions. Under the specific statistical model, the n components share certain similarity. Thus, the empirical Bayes approach is employed. We give a general formulation of this empirical Bayes decision problem with a specialization to the problem of selecting good Poisson populations. Three empirical Bayes methods are used to incorporate information from different sources for making a decision for each of the n components. They are: nonparametric empirical Bayes, parametric empirical Bayes and hierarchical empirical Bayes. For each of them, a corresponding empirical Bayes decision rule is proposed. The asymptotic optimality properties and the convergence rates of the three empirical Bayes rules are investigated. It is shown that for each of the three empirical Bayes rules, the rate of convergence is at least of order $O(\exp(-cn + \ln n))$ for some positive constant c, where the value of c varies depending on the empirical Bayes rule used.
1.  Introduction


We consider a decision problem involving n components as follows. Let $\pi_1, \ldots, \pi_n$ denote n independent populations of the n components, respectively, where population $\pi_i$ is characterized by a parameter $\theta_i$, $i = 1, \ldots, n$. For the given decision problem, let $a_i$ denote an action for the $i$-th component and let $L(\theta_i, a_i)$ be the corresponding loss function. Thus, $L^*(\theta, a) = \sum_{i=1}^n L(\theta_i, a_i)$ is the total loss, where $\theta = (\theta_1, \ldots, \theta_n)$ and $a = (a_1, \ldots, a_n)$. Suppose that for each $i = 1, \ldots, n$, the parameter $\theta_i$ is a realization of a random variable $\Theta_i$, which has a prior distribution $G_i$ over the parameter space $\Omega_i$. Let $X_i$ denote a random observation arising from population $\pi_i$ with probability density function $f_i(x|\theta_i)$. Let $d_i$ be a decision rule defined on the sample space $\mathcal{X}_i$ of $X_i$ for the $i$-th component problem. Then, under some regularity conditions, the total Bayes risk of the decision rule $d = (d_1, \ldots, d_n)$ is:

$$r(G, d) = \sum_{i=1}^n r_i(G_i, d_i), \qquad (1.1)$$

where $G = G_1 \times \cdots \times G_n$, and

$$r_i(G_i, d_i) = \int_{\Omega_i}\int_{\mathcal{X}_i} L(\theta, d_i(x)) f_i(x|\theta)\,dx\,dG_i(\theta) = \int_{\mathcal{X}_i}\left[\int_{\Omega_i} L(\theta, d_i(x))\,dG_i(\theta|x)\right] f_i(x)\,dx, \qquad (1.2)$$

where $G_i(\theta|x)$ is the posterior distribution of $\Theta_i$ given $X_i = x$ and $f_i(x)$ is the marginal probability density function of $X_i$. Thus, for the $i$-th component problem, the Bayes rule is the one which minimizes $\int_{\Omega_i} L(\theta, d_i(x))\,dG_i(\theta|x)$ among the class of decision rules for the $i$-th component decision problem. The overall minimum Bayes risk is

$$r(G, d_B) = \sum_{i=1}^n r_i(G_i, d_{iB}),$$

where $d_B = (d_{1B}, \ldots, d_{nB})$ and $d_{iB}$ is a Bayes rule for the $i$-th component decision problem, $i = 1, \ldots, n$.


When the prior distributions $G_i$, $i = 1, \ldots, n$, are unknown, the Bayes rule cannot be applied. However, in many situations, the n component decision problems may share the same or similar properties. When this occurs, one may incorporate all the information obtained from different sources and make an appropriate decision for each of the n components. This idea is analogous to the empirical Bayes approach of Robbins (1956, 1964). Thus, in the following, we let $d_i$ denote a decision rule for the $i$-th component problem, where $d_i$ is now defined on the sample space $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_n$ of $X = (X_1, \ldots, X_n)$; also, we denote $d_i(x_1, \ldots, x_n)$ by $d_i(x_i|x(i))$, where $x(i) = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$. Then,

$$r_i(G_i, d_i) = E_i\left[\int_{\Omega_i}\int_{\mathcal{X}_i} L(\theta, d_i(x|X(i))) f_i(x|\theta)\,dx\,dG_i(\theta)\right], \qquad (1.3)$$

where the expectation $E_i$ is taken with respect to the marginal distribution of $X(i) = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$. Since $r_i(G_i, d_{iB})$ is the minimum Bayes risk for the $i$-th component problem, $r_i(G_i, d_i) - r_i(G_i, d_{iB}) \ge 0$ for each $i = 1, \ldots, n$, and therefore,

$$r(G, d) - r(G, d_B) = \sum_{i=1}^n [r_i(G_i, d_i) - r_i(G_i, d_{iB})] \ge 0.$$

In certain compound decision problems, the average $\frac{1}{n}[r(G,d) - r(G,d_B)]$ has been used as a measure of the performance of the decision rule d. The asymptotic behavior of $\frac{1}{n}[r(G,d) - r(G,d_B)]$ has been investigated extensively; see, for example, Vardeman (1978, 1980), Gilliland and Hannan (1986) and Gilliland, Hannan and Huang (1976), among others. Many of the results indicate that $\frac{1}{n}[r(G,d) - r(G,d_B)]$ tends to 0 as n tends to infinity. However, so far as we know, the asymptotic behavior of the regret $r(G,d) - r(G,d_B)$ itself has not been investigated, since it seems that $r(G,d) - r(G,d_B)$ might tend to infinity as n tends to infinity. Very surprisingly, we find that in certain compound empirical Bayes decision problems, $r(G,d) - r(G,d_B) \to 0$ as $n \to \infty$. This result indicates the advantage of incorporating all the information from different sources when making a decision for each of the n component problems.


In  this  paper,  we  investigate  the  asymptotic  optimality  properties  of  certain  empirical 
Bayes  procedures  for  simultaneous  testing  problems.  The  regret  value  r(G,d)  —  r(G,dB) 
is  used  as  a  measure  of  the  performance  of  the  decision  rule  d.  The  general  framework 
of  the  empirical  Bayes  decision  problems  under  study  is  formulated  in  Section  2.  Then, 
examples  are  given  and  used  to  illustrate  how  to  incorporate  information  from  different 
sources.  For  each  of  them,  the  corresponding  convergence  rate  is  investigated. 
2.  Formulation  of  the  Empirical  Bayes  Decision  Problem


Let $\pi_1, \ldots, \pi_n$ denote n independent populations. For each $i = 1, \ldots, n$, population $\pi_i$ is characterized by a parameter $\theta_i$. Let $\theta_0$ denote a standard or a control. The problem of selecting populations with respect to a control has been extensively studied in the literature. Dunnett (1955) and Gupta and Sobel (1958) have considered problems of selecting a subset containing all populations better than a control using some natural procedures. Lehmann (1961) and Spjøtvoll (1972) have treated the problem using methods from the theory of testing hypotheses. Randles and Hollander (1971), Gupta and Kim (1980), Miescke (1981) and Gupta and Miescke (1985) have derived optimal procedures via minimax or gamma-minimax approaches. The reader is referred to Gupta and Panchapakesan (1979, 1985) for an overview of this research area. In this paper, we study the problem of selecting good populations from among n populations using the empirical Bayes approach.

For each $i = 1, \ldots, n$, let $X_i$ denote a random observation arising from population $\pi_i$ with probability density function $f(x|\theta_i)$. The observation $X_i$ may be thought of as the value of a sufficient statistic for the parameter $\theta_i$ based on several iid observations taken from $\pi_i$. Let $\theta_0$ be a known constant. This $\theta_0$ can be used as a standard level to evaluate each of the n populations. Population $\pi_i$ is said to be good if $\theta_i \ge \theta_0$, and bad otherwise. Our goal is to select all the good populations and exclude all the bad populations.

Let $\Omega = \{\theta = (\theta_1, \ldots, \theta_n) \mid f(x|\theta_i) \text{ is well-defined}, i = 1, \ldots, n\}$ be the parameter space and let $\mathcal{A} = \{a = (a_1, \ldots, a_n) \mid a_i = 0, 1, i = 1, \ldots, n\}$ be the action space. When action a is taken, population $\pi_i$ is selected as a good population if $a_i = 1$, and excluded as a bad one if $a_i = 0$. For each $\theta \in \Omega$ and $a \in \mathcal{A}$, the loss function $L(\theta, a)$ is defined to be:

$$L(\theta, a) = \sum_{i=1}^n a_i(\theta_0 - \theta_i)I(\theta_0 - \theta_i) + \sum_{i=1}^n (1 - a_i)(\theta_i - \theta_0)I(\theta_i - \theta_0), \qquad (2.1)$$

where $I(x) = 1$ (0) if $x > (\le)\ 0$.
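The loss (2.1) charges $\theta_0 - \theta_i$ for selecting a bad population and $\theta_i - \theta_0$ for excluding a good one. A minimal Python sketch of that bookkeeping follows; the function name `selection_loss` is hypothetical and not from the paper.

```python
def selection_loss(theta, a, theta0):
    """Total loss (2.1) for parameters theta, actions a (each 0 or 1) and standard theta0."""
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    total = 0.0
    for t, ai in zip(theta, a):
        if ai not in (0, 1):
            raise ValueError("each action must be 0 or 1")
        if ai == 1 and theta0 - t > 0:
            # selected a bad population: pay theta0 - theta_i
            total += theta0 - t
        elif ai == 0 and t - theta0 > 0:
            # excluded a good population: pay theta_i - theta0
            total += t - theta0
    return total


# Selecting theta = 0.5 costs 0.5, excluding theta = 2.0 costs 1.0,
# and theta = 1.0 equals theta0, so either action costs nothing there.
print(selection_loss([0.5, 2.0, 1.0], [1, 0, 1], theta0=1.0))  # 1.5
```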

It is assumed that for each i, the parameter $\theta_i$ is a realization of a random variable $\Theta_i$. It is also assumed that the n random variables $\Theta_i$, $i = 1, \ldots, n$, are independently distributed with a common but unknown prior distribution G. Thus, $\Theta = (\Theta_1, \ldots, \Theta_n)$ has a joint prior distribution $\mathbf{G}(\theta) = \prod_{i=1}^n G(\theta_i)$ over the parameter space $\Omega$. Under the preceding assumptions, $X_1, \ldots, X_n$ are iid with the marginal probability density function

$$f(x) = \int f(x|\theta)\,dG(\theta).$$

For each $i = 1, \ldots, n$, let $\mathcal{X}_i$ be the sample space of $X_i$, and let $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_n$. Let $X = (X_1, \ldots, X_n)$ and let $x = (x_1, \ldots, x_n)$ be the observed value of X. A selection rule $d = (d_1, \ldots, d_n)$ is defined to be a mapping from $\mathcal{X}$ into $[0, 1]^n$ such that $d_i(x)$ is the probability of selecting $\pi_i$ as a good population given $X = x$. Let D be the class of all selection rules, and let $r(G, d)$ denote the Bayes risk associated with each $d \in D$. Then, $r(G) = \inf_{d \in D} r(G, d)$ is the minimum Bayes risk.

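The Bayes risk associated with any rule $d \in D$ can be rewritten as

$$r(G, d) = \sum_{i=1}^n r_i(G, d_i), \qquad (2.2)$$

where

$$r_i(G, d_i) = \int_{\mathcal{X}} [\theta_0 - \varphi_i(x_i)]\,d_i(x) \prod_{j=1}^n f(x_j)\,dx + C, \qquad (2.3)$$

where $\varphi_i(x_i) = E[\Theta_i | X_i = x_i] = \int \theta f(x_i|\theta)\,dG(\theta)/f(x_i)$ is the posterior mean of $\Theta_i$ given $X_i = x_i$, and $C = \int_{\mathcal{X}_i}\int_{\theta_0}^{\infty} (\theta - \theta_0) f(x|\theta)\,dG(\theta)\,dx$.

Since the value C is independent of the selection rule d, from (2.3), a Bayes rule, say $d_B = (d_{1B}, \ldots, d_{nB})$, is clearly given by

$$d_{iB}(x_i) = \begin{cases} 1 & \text{if } \varphi_i(x_i) \ge \theta_0, \\ 0 & \text{otherwise,} \end{cases} \qquad (2.4)$$

and the minimum Bayes risk is $r(G) = \sum_{i=1}^n r_i(G, d_{iB})$.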

Since the prior distribution G is unknown, it is not possible to apply the Bayes rule $d_B$ to the selection problem at hand. However, the selection problem under study can be viewed as a Bayes decision problem having n components with a common unknown prior distribution. Thus, the empirical Bayes approach of Robbins (1956, 1964) can be employed here. We use all the observations obtained from the n populations to form a decision for each of the n component problems.
Let $\varphi_{in}(x_i|x(i))$ be an estimator of $\varphi_i(x_i)$ based on $(x_1, \ldots, x_n)$, where $x(i) = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$. We then define a selection rule $d_n = (d_{1n}, \ldots, d_{nn})$ as follows:

$$d_{in}(x_i|x(i)) = d_{in}(x) = \begin{cases} 1 & \text{if } \varphi_{in}(x_i|x(i)) \ge \theta_0, \\ 0 & \text{otherwise.} \end{cases} \qquad (2.5)$$

The associated Bayes risk of the selection rule $d_n$ is:

$$r(G, d_n) = \sum_{i=1}^n r_i(G, d_{in}), \qquad (2.6)$$

where

$$r_i(G, d_{in}) = E_i\left[\int_{\mathcal{X}_i} [\theta_0 - \varphi_i(x_i)]\,d_{in}(x_i|X(i)) f(x_i)\,dx_i\right] + C, \qquad (2.7)$$

where the expectation $E_i$ is taken with respect to $X(i)$. Recall that $r_i(G, d_{iB})$ is the minimum Bayes risk for the $i$-th component problem. Thus, $r_i(G, d_{in}) - r_i(G, d_{iB}) \ge 0$, and therefore $r(G, d_n) - r(G) \ge 0$. For the empirical Bayes selection rule $d_n$ to be useful, we always desire that the average nonnegative difference $(r(G, d_n) - r(G))/n$ or the total nonnegative difference $r(G, d_n) - r(G)$ be small.

Definition 2.1

(a) A decision rule $d_n$ is said to be weakly asymptotically optimal relative to the (unknown) prior G if $(r(G, d_n) - r(G))/n \to 0$ as $n \to \infty$.

(b) A decision rule $d_n$ is said to be strongly asymptotically optimal relative to the (unknown) prior G if $r(G, d_n) - r(G) \to 0$ as $n \to \infty$.

Clearly, for a selection rule $d_n$, strong asymptotic optimality implies weak asymptotic optimality. The weak asymptotic optimality of compound decision rules has been studied in the literature by many authors, notably Vardeman (1978, 1980), Gilliland and Hannan (1986), and Gilliland, Hannan and Huang (1976), though the formulations of their compound decision problems differ from the one we consider here. However, very surprisingly, it seems that strong asymptotic optimality has not been investigated so far. In the following, we consider the problem of selecting good Poisson populations, and use it as an example to illustrate how to incorporate information from different sources for making decisions. Selection rules are constructed according to how much we know about the prior distribution G. The strong asymptotic optimality of the selection rules is investigated, and the associated convergence rates are established.


3.  Selecting  Good  Poisson  Populations 

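It is assumed that for each $i = 1, \ldots, n$, the random observation $X_i$ arises from a Poisson population with mean $\theta_i$. That is, $f(x_i|\theta_i) = e^{-\theta_i}\theta_i^{x_i}/x_i!$, $x_i = 0, 1, 2, \ldots$. Then, $f(x_i) = \int_0^\infty e^{-\theta}\theta^{x_i}/x_i!\,dG(\theta) = a(x_i)h(x_i)$, where $a(x_i) = 1/x_i!$ and $h(x_i) = \int_0^\infty e^{-\theta}\theta^{x_i}\,dG(\theta)$, and $\varphi_i(x_i) = h(x_i + 1)/h(x_i)$. Let $\theta_0 > 0$ be the known standard level. The Bayes rule $d_B = (d_{1B}, \ldots, d_{nB})$ for this problem is:

$$d_{iB}(x_i) = \begin{cases} 1 & \text{if } \varphi_i(x_i) \ge \theta_0, \\ 0 & \text{otherwise.} \end{cases}$$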
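For a concrete picture of $\varphi_i(x_i) = h(x_i + 1)/h(x_i)$ and the Bayes rule, the following sketch assumes, purely for illustration, a prior G with finitely many support points (in the paper G is unknown). The helper names `h`, `posterior_mean` and `bayes_rule` are hypothetical and only mirror the notation above.

```python
import math


def h(x, support, weights):
    """h(x) = integral of e^{-theta} theta^x dG(theta) for a discrete prior G
    that puts mass weights[j] on the point support[j] (illustrative assumption)."""
    return sum(w * math.exp(-t) * t ** x for t, w in zip(support, weights))


def posterior_mean(x, support, weights):
    """phi(x) = h(x + 1) / h(x), the posterior mean of Theta given X = x."""
    return h(x + 1, support, weights) / h(x, support, weights)


def bayes_rule(x, theta0, support, weights):
    """d_B(x) = 1 (select) if phi(x) >= theta0, else 0 (exclude)."""
    return 1 if posterior_mean(x, support, weights) >= theta0 else 0


support, weights = [1.0, 3.0], [0.5, 0.5]
print([round(posterior_mean(x, support, weights), 3) for x in range(5)])
print([bayes_rule(x, 2.0, support, weights) for x in range(5)])
```

Because the Poisson family has a monotone likelihood ratio, the printed posterior means increase with x, so the Bayes rule switches from exclusion to selection exactly once.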


Since the prior distribution G is unknown, it is not possible to apply the Bayes rule $d_B$ here. Therefore, in the following, empirical Bayes rules are constructed according to how much information we have about the prior distribution G.

3.1.  A  Nonparametric  Empirical  Bayes  Rule

First, it is assumed that the prior distribution G is completely unknown. Thus, the nonparametric empirical Bayes approach is employed. Note that the Bayes rule $d_B$ is a monotone rule. That is, for each $i = 1, \ldots, n$, $d_{iB}(x)$ is nondecreasing in $x_i$ when all the other variables are kept fixed. This follows from the increasing property of $\varphi_i(x_i)$, which can be verified by noting that $f(x|\theta)$ has the monotone likelihood ratio property. Thus, it is desirable that the empirical Bayes rules under consideration be monotone.

For each $i = 1, \ldots, n$, let $N_{in} = \max_{j \ne i} X_j - 1$. For each $x_i = 0, 1, \ldots, N_{in} + 1$, let

$$f_{in}(x_i) = \frac{1}{n-1}\sum_{j \ne i} I\{X_j = x_i\}, \qquad (3.1)$$

$$h_{in}(x_i) = f_{in}(x_i)/a(x_i). \qquad (3.2)$$
Since it is possible that $h_{in}(x_i)$ may be equal to 0, we define

$$\varphi_{in}(x_i) = [h_{in}(x_i + 1) + \delta_n]/[h_{in}(x_i) + \delta_n], \qquad (3.3)$$

where $\delta_n > 0$ is such that $\delta_n = o(1)$.
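The estimators (3.1)-(3.3) only need the leave-one-out counts. The sketch below is one possible reading of those formulas, assuming the observations are stored in a Python list and using a 0-based index `i`; the function name and the return layout are hypothetical. It also returns the weights $W_{in}(x) = [h_{in}(x) + \delta_n]a(x + 1)$ that the isotonic smoothing step uses.

```python
import math
from collections import Counter


def nonparametric_phi(xs, i, delta):
    """Leave-one-out estimates phi_in(x), x = 0..N_in, following (3.1)-(3.3).

    xs    : observed counts x_1, ..., x_n (non-negative ints)
    i     : 0-based index of the component being decided
    delta : the smoothing constant delta_n > 0
    Returns (phi, weights) with weights[x] = [h_in(x) + delta] * a(x + 1).
    """
    others = xs[:i] + xs[i + 1:]            # X(i): every observation except x_i
    counts = Counter(others)
    n_minus_1 = len(others)
    N = max(others) - 1                     # N_in = max_{j != i} X_j - 1
    f = [counts.get(x, 0) / n_minus_1 for x in range(N + 2)]           # (3.1)
    h = [f[x] * math.factorial(x) for x in range(N + 2)]                # (3.2), a(x) = 1/x!
    phi = [(h[x + 1] + delta) / (h[x] + delta) for x in range(N + 1)]   # (3.3)
    weights = [(h[x] + delta) / math.factorial(x + 1) for x in range(N + 1)]
    return phi, weights


phi, w = nonparametric_phi([0, 1, 1, 2, 3, 0, 2, 1], 0, delta=0.01)
print([round(v, 3) for v in phi])
```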

It is intuitive to use $\varphi_{in}(x_i)$ as an estimator of $\varphi_i(x_i)$, and one may obtain an empirical Bayes rule as follows: select $\pi_i$ as a good population if $\varphi_{in}(x_i) \ge \theta_0$, and exclude $\pi_i$ as a bad one otherwise. However, this selection rule is not monotone since $\varphi_{in}(x_i)$ may not possess the increasing property. Thus, we consider a smoothed version of $\varphi_{in}(x_i)$. Let $\{\varphi^*_{in}(x_i)\}_{x_i=0}^{N_{in}}$ be the isotonic regression of $\{\varphi_{in}(x_i)\}_{x_i=0}^{N_{in}}$ with random weights $\{W_{in}(x_i)\}_{x_i=0}^{N_{in}}$, where $W_{in}(x_i) = [h_{in}(x_i) + \delta_n]a(x_i + 1)$. For $y > N_{in}$, define $\varphi^*_{in}(y) = \varphi^*_{in}(N_{in})$. Therefore, $\varphi^*_{in}(x_i)$ is nondecreasing in $x_i$, $x_i = 0, 1, 2, \ldots$. We use $\varphi^*_{in}(x_i)$ to estimate $\varphi_i(x_i)$ and propose an empirical Bayes rule $d^*_n = (d^*_{1n}, \ldots, d^*_{nn})$ as follows: For each $i = 1, \ldots, n$,

$$d^*_{in}(x_i|x(i)) = \begin{cases} 1 & \text{if } \varphi^*_{in}(x_i) \ge \theta_0, \\ 0 & \text{otherwise.} \end{cases} \qquad (3.4)$$
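The smoothing step requires a weighted isotonic (nondecreasing) regression. The paper does not prescribe an algorithm; the sketch below assumes the standard pool-adjacent-violators algorithm (PAVA) and cross-checks it against the brute-force max-min formula for the isotonic regression. Its inputs are the raw estimates $\varphi_{in}(x)$, $x = 0, \ldots, N_{in}$, and the weights $W_{in}(x)$; all function names are hypothetical.

```python
import math


def isotonic_increasing(values, weights):
    """Weighted nondecreasing isotonic regression via pool-adjacent-violators."""
    blocks = []                     # each block: [weighted mean, total weight, size]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # pool the last two blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for v, _, c in blocks:
        fitted.extend([v] * c)
    return fitted


def maxmin_isotonic(values, weights):
    """Brute-force max-min formula for the same fit (used only as a cross-check)."""
    n = len(values)
    fitted = []
    for x in range(n):
        best = -math.inf
        for z in range(x + 1):
            inner = math.inf
            for y in range(x, n):
                num = sum(values[k] * weights[k] for k in range(z, y + 1))
                den = sum(weights[k] for k in range(z, y + 1))
                inner = min(inner, num / den)
            best = max(best, inner)
        fitted.append(best)
    return fitted


def d_star(phi, weights, x, theta0):
    """Rule (3.4): select iff phi*_in(x) >= theta0, where phi*_in(y) = phi*_in(N_in) for y > N_in."""
    fitted = isotonic_increasing(phi, weights)
    return 1 if fitted[min(x, len(fitted) - 1)] >= theta0 else 0


print(isotonic_increasing([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0]))  # [1.0, 2.5, 2.5, 4.0]
```

PAVA is used here only because it is short; any exact weighted isotonic regression routine yields the same fitted values.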


The  performance  of  the  preceding  nonparametric  empirical  Bayes  procedure  will  be 
discussed  in  Section  4. 

3.2.  A  Parametric  Empirical  Bayes  Rule

Here we assume that the prior distribution G is a member of the gamma distribution family with unknown shape and scale parameters k and β, respectively. That is, G has a density function $g(\theta|k, \beta)$, where

$$g(\theta|k, \beta) = \beta^k\theta^{k-1}e^{-\beta\theta}/\Gamma(k), \quad \theta > 0.$$

Then, $X_1, \ldots, X_n$ are iid with marginal probability function $f(x) = \Gamma(x + k)\beta^k/[\Gamma(k)(1 + \beta)^{x+k}x!]$, $x = 0, 1, 2, \ldots$. Also, $\varphi_i(x) = (x + k)/(1 + \beta)$. A straightforward computation yields $\mu_1 = E[X_i] = k/\beta$ and $\mu_2 = E[X_i^2] = (k + 1)k/\beta^2 + k/\beta$. Thus, $\beta = \mu_1/(\mu_2 - \mu_1 - \mu_1^2)$ and $k = \mu_1^2/(\mu_2 - \mu_1 - \mu_1^2)$. Therefore, $\varphi_i(x) = [x(\mu_2 - \mu_1 - \mu_1^2) + \mu_1^2]/(\mu_2 - \mu_1^2)$.
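The moment relations above can be checked numerically. This sketch (hypothetical function names) computes $(\mu_1, \mu_2)$ from a chosen $(k, \beta)$ with the formulas in the text, inverts them, and evaluates $\varphi(x)$ through the moment expression.

```python
def gamma_poisson_moments(k, beta):
    """mu_1 = E[X] and mu_2 = E[X^2] under the gamma prior density g(theta | k, beta) above."""
    mu1 = k / beta
    mu2 = (k + 1) * k / beta ** 2 + k / beta
    return mu1, mu2


def params_from_moments(mu1, mu2):
    """Invert the moment relations: k = mu1^2 / s and beta = mu1 / s, where s = mu2 - mu1 - mu1^2."""
    s = mu2 - mu1 - mu1 ** 2          # equals k / beta^2 > 0 under the gamma model
    if s <= 0:
        raise ValueError("mu2 - mu1 - mu1**2 must be positive")
    return mu1 ** 2 / s, mu1 / s


def phi_from_moments(x, mu1, mu2):
    """phi(x) = [x(mu2 - mu1 - mu1^2) + mu1^2] / (mu2 - mu1^2), i.e. (x + k) / (1 + beta)."""
    return (x * (mu2 - mu1 - mu1 ** 2) + mu1 ** 2) / (mu2 - mu1 ** 2)


mu1, mu2 = gamma_poisson_moments(2.5, 0.8)
print(params_from_moments(mu1, mu2))   # approximately (2.5, 0.8)
```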
For each $i = 1, \ldots, n$, let $\mu_{1n}(i) = \frac{1}{n-1}\sum_{j \ne i} X_j$ and $\mu_{2n}(i) = \frac{1}{n-1}\sum_{j \ne i} X_j^2$. That is, $\mu_{1n}(i)$ and $\mu_{2n}(i)$ are moment estimators of $\mu_1$ and $\mu_2$, respectively, based on $X(i)$. Note that it is possible that $\mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) \le 0$ even though $\mu_2 - \mu_1 - \mu_1^2 > 0$. Now, for each $i = 1, \ldots, n$ and $x_i = 0, 1, 2, \ldots$, define

$$\hat\varphi_{in}(x_i) = \begin{cases} \dfrac{x_i[\mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i)] + \mu_{1n}^2(i)}{\mu_{2n}(i) - \mu_{1n}^2(i)} & \text{if } \mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) > 0, \\ \ldots & \text{otherwise.} \end{cases} \qquad (3.5)$$


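We then propose an empirical Bayes rule $\hat{d}_n = (\hat{d}_{1n}, \ldots, \hat{d}_{nn})$ as follows:

$$\hat{d}_{in}(x_i|x(i)) = \hat{d}_{in}(x) = \begin{cases} 1 & \text{if } \hat\varphi_{in}(x_i) \ge \theta_0, \\ 0 & \text{otherwise.} \end{cases} \qquad (3.6)$$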
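A sketch of the leave-one-out estimator (3.5) and the rule (3.6) follows. The value of $\hat\varphi_{in}$ in the "otherwise" branch of (3.5) is not reproduced here, so the sketch takes it as a hypothetical user-supplied `fallback` argument; indices are 0-based and all names are illustrative.

```python
def parametric_phi(xs, i, fallback=None):
    """Leave-one-out parametric EB estimate (3.5) evaluated at x_i.

    Uses the moment estimators mu_1n(i), mu_2n(i) computed from X(i).  When
    mu_2n - mu_1n - mu_1n^2 <= 0 the sketch returns `fallback`, a hypothetical
    stand-in for the 'otherwise' branch of (3.5).
    """
    others = xs[:i] + xs[i + 1:]
    m = len(others)
    mu1 = sum(others) / m
    mu2 = sum(v * v for v in others) / m
    s = mu2 - mu1 - mu1 ** 2              # estimates k / beta^2
    if s > 0:
        return (xs[i] * s + mu1 ** 2) / (mu2 - mu1 ** 2)
    return fallback


def parametric_rule(xs, i, theta0, fallback=None):
    """Rule (3.6): select population i iff hat-phi_in(x_i) >= theta0."""
    phi = parametric_phi(xs, i, fallback)
    if phi is None:
        raise ValueError("no estimate: supply a fallback for the 'otherwise' branch")
    return 1 if phi >= theta0 else 0


xs = [0, 2, 5, 1, 7, 3]
print([parametric_rule(xs, i, theta0=3.0) for i in range(len(xs))])
```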


3.3.  A  Hierarchical  Empirical  Bayes  Rule


Now, it is assumed that the prior distribution G is a gamma distribution with a known shape parameter k and an unknown scale parameter β. In this situation, the preceding parametric empirical Bayes approach can be applied. However, since our purpose is to introduce methods for incorporating data from different sources, a new method, called hierarchical empirical Bayes, is used in the following.

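Since β is a scale parameter, we assume that β has an improper prior $h(\beta) = \beta^{-1}$, $\beta > 0$. Thus, conditional on β, $X_1, \ldots, X_n$ are iid with the probability function

$$f(x|\beta) = \int_0^\infty f(x|\theta)g(\theta|k, \beta)\,d\theta = \frac{\Gamma(x + k)\beta^k}{\Gamma(k)(1 + \beta)^{x+k}x!}, \quad x = 0, 1, 2, \ldots.$$

Therefore, $(X_1, \ldots, X_n)$ has a joint marginal probability function $f(x_1, \ldots, x_n)$, where

$$f(x_1, \ldots, x_n) = \prod_{j=1}^n \frac{\Gamma(x_j + k)}{\Gamma(k)\,x_j!}\int_0^\infty \frac{\beta^{nk-1}}{(1 + \beta)^b}\,d\beta, \quad \text{where } b = nk + \sum_{j=1}^n x_j.$$

Thus, the posterior density function of β given $(X_1, \ldots, X_n) = (x_1, \ldots, x_n)$ is

$$h(\beta|x_1, \ldots, x_n) = \frac{f(x_1|\beta)\cdots f(x_n|\beta)h(\beta)}{f(x_1, \ldots, x_n)} = \frac{\beta^{nk-1}}{(1 + \beta)^b}\left[\int_0^\infty \frac{\beta^{nk-1}}{(1 + \beta)^b}\,d\beta\right]^{-1},$$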
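and the posterior mean of β given $(X_1, \ldots, X_n) = (x_1, \ldots, x_n)$ is

$$\beta_n = E[\beta|x_1, \ldots, x_n] = \begin{cases} \dfrac{nk}{\sum_{j=1}^n x_j - 1} & \text{if } \sum_{j=1}^n x_j \ge 2, \\ \infty & \text{if } \sum_{j=1}^n x_j \le 1. \end{cases}$$

Now, for each $i = 1, \ldots, n$, and $x_i = 0, 1, 2, \ldots$, define

$$\tilde\varphi_{in}(x_i) = \begin{cases} (x_i + k)/(1 + \beta_n) & \text{if } \sum_{j=1}^n x_j \ge 2, \\ 0 & \text{if } \sum_{j=1}^n x_j \le 1. \end{cases} \qquad (3.7)$$

We then give an empirical Bayes rule $\tilde{d}_n = (\tilde{d}_{1n}, \ldots, \tilde{d}_{nn})$ as follows:

$$\tilde{d}_{in}(x_i|x(i)) = \tilde{d}_{in}(x) = \begin{cases} 1 & \text{if } \tilde\varphi_{in}(x_i) \ge \theta_0, \\ 0 & \text{otherwise.} \end{cases} \qquad (3.8)$$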
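The hierarchical estimator (3.7) and rule (3.8) reduce to closed-form arithmetic. The sketch below (hypothetical names, 0-based indices) also recomputes $\beta_n$ as a ratio of beta functions: the posterior of β above is proportional to $\beta^{nk-1}(1 + \beta)^{-b}$, and $\int_0^\infty \beta^{p-1}(1 + \beta)^{-(p+q)}\,d\beta = B(p, q)$, which gives an independent check of $nk/(\sum_j x_j - 1)$.

```python
import math


def hierarchical_phi(xs, i, k):
    """(3.7): (x_i + k)/(1 + beta_n), beta_n = nk/(sum_j x_j - 1); 0 if sum_j x_j <= 1."""
    total = sum(xs)                       # the sum runs over all n observations
    if total <= 1:
        return 0.0
    beta_n = len(xs) * k / (total - 1)    # posterior mean of beta
    return (xs[i] + k) / (1 + beta_n)


def hierarchical_rule(xs, i, k, theta0):
    """(3.8): select population i iff tilde-phi_in(x_i) >= theta0."""
    return 1 if hierarchical_phi(xs, i, k) >= theta0 else 0


def beta_fn(p, q):
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def posterior_mean_beta(n, k, s):
    """E[beta | x] = B(nk + 1, s - 1) / B(nk, s), valid when s = sum_j x_j >= 2."""
    return beta_fn(n * k + 1, s - 1) / beta_fn(n * k, s)


xs = [2, 0, 3, 1]
print(posterior_mean_beta(len(xs), 1.5, sum(xs)), len(xs) * 1.5 / (sum(xs) - 1))
print([hierarchical_rule(xs, i, 1.5, theta0=2.0) for i in range(len(xs))])
```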


4.  Asymptotic  Optimality  of  the  Proposed  Empirical  Bayes  Rules


In  this  section,  we  investigate  the  asymptotic  optimality  of  the  proposed  empirical 
Bayes  rules. 

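Let $A(\theta_0) = \{x \mid \varphi(x) > \theta_0\}$ and $B(\theta_0) = \{x \mid \varphi(x) < \theta_0\}$. Define

$$M = \begin{cases} \min A(\theta_0) & \text{if } A(\theta_0) \ne \emptyset, \\ \infty & \text{otherwise,} \end{cases} \qquad (4.1)$$

$$m = \begin{cases} \max B(\theta_0) & \text{if } B(\theta_0) \ne \emptyset, \\ -1 & \text{otherwise,} \end{cases} \qquad (4.2)$$

where $\emptyset$ denotes the empty set.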


By the increasing property of $\varphi(x)$ in the variable x, $m \le M$; also $m < M$ if $A(\theta_0) \ne \emptyset$. Furthermore, $x \le m$ iff $\varphi(x) < \theta_0$, and $y \ge M$ iff $\varphi(y) > \theta_0$. In the following, we consider only those priors G such that $\int_0^\infty \theta\,dG(\theta) < \infty$ and $m < \infty$. Note that the preceding requirements are always met if the prior distribution G is a member of the gamma distribution family. Let $d_n = (d_{1n}, \ldots, d_{nn})$ be any of the three proposed empirical Bayes rules and let $(\varphi_{1n}(x_1), \ldots, \varphi_{nn}(x_n))$ be the corresponding empirical Bayes estimators. By the definitions of $\varphi^*_{in}(x_i)$, $\hat\varphi_{in}(x_i)$ and $\tilde\varphi_{in}(x_i)$, $\varphi_{in}(x_i)$ is increasing in $x_i$ when all the other variables $x_j$, $j \ne i$, are kept fixed. Thus, for each $i = 1, \ldots, n$,


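$$\begin{aligned} 0 &\le r_i(G, d_{in}) - r_i(G, d_{iB}) \\ &= \sum_{x_i=0}^{m} [\theta_0 - \varphi(x_i)] P\{\varphi_{in}(x_i) \ge \theta_0\} f(x_i) + \sum_{x_i=M}^{\infty} [\varphi(x_i) - \theta_0] P\{\varphi_{in}(x_i) < \theta_0\} f(x_i) \\ &\le \sum_{x_i=0}^{m} [\theta_0 - \varphi(x_i)] P\{\varphi_{in}(m) \ge \theta_0\} f(x_i) + \sum_{x_i=M}^{\infty} [\varphi(x_i) - \theta_0] P\{\varphi_{in}(M) < \theta_0\} f(x_i) \\ &= b_1 P\{\varphi_{in}(m) \ge \theta_0\} + b_2 P\{\varphi_{in}(M) < \theta_0\}. \end{aligned} \qquad (4.3)$$

In (4.3), the probability measure P is computed with respect to $X(i)$. Also, $0 \le b_1 = \sum_{x=0}^{m} [\theta_0 - \varphi(x)] f(x) < \infty$ and $0 \le b_2 = \sum_{x=M}^{\infty} [\varphi(x) - \theta_0] f(x) < \infty$. The finiteness of both $b_1$ and $b_2$ is guaranteed by the assumption that $\int_0^\infty \theta\,dG(\theta) < \infty$.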

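From (4.3), we obtain:

$$\begin{aligned} 0 \le r(G, d_n) - r(G) &= \sum_{i=1}^n [r_i(G, d_{in}) - r_i(G, d_{iB})] \\ &\le \sum_{i=1}^n \left[b_1 P\{\varphi_{in}(m) \ge \theta_0\} + b_2 P\{\varphi_{in}(M) < \theta_0\}\right]. \end{aligned} \qquad (4.4)$$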


Therefore, it suffices to consider the asymptotic behavior of $P\{\varphi_{in}(m) \ge \theta_0\}$ and $P\{\varphi_{in}(M) < \theta_0\}$.
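To see what (4.1)-(4.3) amount to for a specific prior, the sketch below assumes a gamma prior as in Section 3.2, so that $\varphi(x) = (x + k)/(1 + \beta)$ and f is the negative binomial probability function given there. It computes M, m, $b_1$ and $b_2$, truncating the infinite sum for $b_2$; the parameter values are arbitrary and all names are hypothetical.

```python
import math


def nb_pmf(x, k, beta):
    """Marginal f(x) = Gamma(x + k) beta^k / [Gamma(k) (1 + beta)^(x + k) x!]."""
    return math.exp(math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(beta / (1 + beta)) - x * math.log(1 + beta))


def thresholds(k, beta, theta0):
    """M = min{x: phi(x) > theta0} and m = max{x: phi(x) < theta0} (or -1)."""
    phi = lambda x: (x + k) / (1 + beta)
    M = 0
    while phi(M) <= theta0:               # phi grows without bound, so this terminates
        M += 1
    m = M - 1
    while m >= 0 and phi(m) >= theta0:    # skip a point where phi(x) == theta0
        m -= 1
    return M, m


def regret_constants(k, beta, theta0, x_max=5000):
    """b_1 and b_2 from (4.3); the sum defining b_2 is truncated at x_max."""
    phi = lambda x: (x + k) / (1 + beta)
    M, m = thresholds(k, beta, theta0)
    b1 = sum((theta0 - phi(x)) * nb_pmf(x, k, beta) for x in range(m + 1))
    b2 = sum((phi(x) - theta0) * nb_pmf(x, k, beta) for x in range(M, x_max))
    return M, m, b1, b2


print(regret_constants(2.0, 1.0, 2.2))
```

Since any x strictly between m and M has $\varphi(x) = \theta_0$, the difference $b_2 - b_1$ equals $E[\Theta] - \theta_0 = k/\beta - \theta_0$, which gives a simple sanity check on the computed values.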

4.1.  Asymptotic  Optimality  of  $d^*_n$

We  first  present  some  useful  results. 

For each $i = 1, \ldots, n$ and $y = 0, 1, \ldots, N_{in}$, let $\Psi_{in}(y) = \sum_{x=0}^{y} \varphi_{in}(x)W_{in}(x)$, $\Psi^*_{in}(y) = \sum_{x=0}^{y} \varphi^*_{in}(x)W_{in}(x)$ and $H_{in}(y) = \sum_{x=0}^{y} W_{in}(x)$, where $W_{in}(x)$, $x = 0, 1, \ldots, N_{in}$, are the random weights defined in Section 3. From Barlow et al. (1972),

$$\Psi^*_{in}(y) \le \Psi_{in}(y) \quad \text{for all } y = 0, 1, \ldots, N_{in}. \qquad (4.5)$$
From Puri and Singh (1988), the isotonic regression estimators $\varphi^*_{in}(x)$, $x = 0, 1, \ldots, N_{in}$, can be rewritten as:

$$\varphi^*_{in}(x) = \min_{x \le y \le N_{in}} \frac{\Psi_{in}(y) - \Psi^*_{in}(x - 1)}{H_{in}(y) - H_{in}(x - 1)}, \quad x = 0, 1, \ldots, N_{in}, \qquad (4.6)$$

where $\Psi^*_{in}(-1) = H_{in}(-1) = 0$. Thus, from (4.5) and (4.6),

$$\varphi^*_{in}(x) \ge \min_{x \le y \le N_{in}} \frac{\Psi_{in}(y) - \Psi_{in}(x - 1)}{H_{in}(y) - H_{in}(x - 1)}, \quad x = 0, 1, \ldots, N_{in}, \qquad (4.7)$$

where $\Psi_{in}(-1) = 0$.

The  following  Lemma  is  taken  from  Liang  (1989). 

Lemma 4.1. Let $\{a_m\}$ be a sequence of real numbers and let $\{b_m\}$ be a sequence of positive numbers such that $b_m \le 1$ and $b_m$ is nonincreasing in m. Then, for each positive constant c,

$$\sup_{n \ge 1} \sum_{m=1}^{n} a_m b_m \ge (>)\ c \implies \sup_{n \ge 1} \sum_{m=1}^{n} a_m \ge (>)\ c.$$
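For the non-strict case, Lemma 4.1 can be restated as $\sup_n \sum_{m \le n} a_m b_m \le \max(0, \sup_n \sum_{m \le n} a_m)$, which follows from Abel summation. The sketch below checks this inequality on random finite sequences with $b_m = 1/(x + 1)$, the ratio $a(x + 1)/a(x)$ that appears later in the Poisson case; it is a numerical illustration rather than a proof, and the function names are hypothetical.

```python
import random
from itertools import accumulate


def sup_partial_sums(seq):
    """Largest partial sum a_1 + ... + a_n of a finite sequence."""
    return max(accumulate(seq))


def check_lemma_4_1(a, b):
    """For b positive, at most 1 and nonincreasing, test
    sup_n sum a_m b_m <= max(0, sup_n sum a_m) (up to rounding)."""
    assert all(0 < y <= 1 for y in b)
    assert all(b[j] >= b[j + 1] for j in range(len(b) - 1))
    weighted = [x * y for x, y in zip(a, b)]
    return sup_partial_sums(weighted) <= max(0.0, sup_partial_sums(a)) + 1e-12


rng = random.Random(0)
M = 3  # hypothetical starting index; b_m = a(x+1)/a(x) = 1/(x+1) for x = M, M+1, ...
b = [1.0 / (x + 1) for x in range(M, M + 50)]
print(all(check_lemma_4_1([rng.gauss(0, 1) for _ in b], b) for _ in range(1000)))
```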


y  y 

Lemma 4.2. Define a function $Q(y) = \theta_0 \sum_{x=M}^{y} \frac{a(x+1)}{a(x)} f(x) - \sum_{x=M}^{y} f(x+1)$ on the set $\{y \mid y = M, M+1, \ldots\}$. Then, Q(y) is a decreasing function of y. Hence $\max_{y \ge M} Q(y) = Q(M) = \frac{a(M+1)}{a(M)} f(M)[\theta_0 - \varphi(M)] < 0$.

Proof: $Q(y+1) - Q(y) = \frac{a(y+2)}{a(y+1)} f(y+1)[\theta_0 - \varphi(y+1)] < 0$, since $y + 1 > M$ and thus $\varphi(y+1) \ge \varphi(M) > \theta_0$. Thus, Q(y) is a decreasing function of y, which leads to the result of this lemma.
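For a gamma prior (an illustrative assumption, with $a(x) = 1/x!$ and f the negative binomial probability function of Section 3.2), Q(y) can be evaluated directly, taking $Q(y) = \theta_0\sum_{x=M}^{y} \frac{a(x+1)}{a(x)} f(x) - \sum_{x=M}^{y} f(x+1)$, to see the monotonicity asserted in Lemma 4.2 and the closed form for Q(M). Names and parameter values are hypothetical.

```python
import math


def nb_pmf(x, k, beta):
    """Negative binomial marginal f(x) under the gamma prior of Section 3.2."""
    return math.exp(math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(beta / (1 + beta)) - x * math.log(1 + beta))


def Q(y, M, k, beta, theta0):
    """Q(y) = theta0 * sum_{x=M}^{y} [a(x+1)/a(x)] f(x) - sum_{x=M}^{y} f(x+1), a(x) = 1/x!."""
    return sum(theta0 * nb_pmf(x, k, beta) / (x + 1) - nb_pmf(x + 1, k, beta)
               for x in range(M, y + 1))


k, beta, theta0 = 2.0, 1.0, 2.2           # illustrative values; phi(x) = (x + 2) / 2
M = 3                                     # smallest x with phi(x) > theta0
values = [Q(y, M, k, beta, theta0) for y in range(M, M + 20)]
phi_M = (M + k) / (1 + beta)
closed_form = nb_pmf(M, k, beta) / (M + 1) * (theta0 - phi_M)
print(all(u > v for u, v in zip(values, values[1:])), values[0], closed_form)
```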


Theorem 4.3. $P\{\varphi^*_{in}(M) < \theta_0\} \le O(\exp(-\tau_1 n))$, where $\tau_1 = \min\left(2\left(Q(M)\min(1, \theta_0^{-1})/8\right)^2, \ln[F(M)]^{-1}\right) > 0$.

Proof:

$$P\{\varphi^*_{in}(M) < \theta_0\} = P\{\varphi^*_{in}(M) < \theta_0, N_{in} < M\} + P\{\varphi^*_{in}(M) < \theta_0, N_{in} \ge M\}. \qquad (4.8)$$

Now,

$$P\{\varphi^*_{in}(M) < \theta_0, N_{in} < M\} \le [F(M)]^{n-1} = O(\exp(-n\ln[F(M)]^{-1})), \qquad (4.9)$$

where $F(\cdot)$ is the marginal distribution function of $X_1$, and the inequality is obtained from the definition of $N_{in}$.

Also, from (3.1)-(3.3), (4.7), Lemma 4.2, and the definitions of $\Psi_{in}(y)$ and $H_{in}(y)$, straightforward computation yields the following:


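$$\begin{aligned} E_1 &= \{\varphi^*_{in}(M) < \theta_0, N_{in} \ge M\} \\ &\subset \{\Psi_{in}(y) - \Psi_{in}(M-1) < \theta_0[H_{in}(y) - H_{in}(M-1)] \text{ for some } y,\ M \le y \le N_{in}\} \\ &\subset \Big\{\sum_{x=M}^{y} [f_{in}(x+1) - f(x+1)] - \theta_0 \sum_{x=M}^{y} \frac{a(x+1)}{a(x)}[f_{in}(x) - f(x)] < (\theta_0 - 1)\delta_n \sum_{x=M}^{y} a(x+1) + Q(M) \text{ for some } y \ge M\Big\}. \end{aligned} \qquad (4.10)$$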


Since $a(x) > 0$ for all $x = 0, 1, \ldots$, $\sum_{x=0}^{\infty} a(x) < \infty$ and $\delta_n = o(1)$, then, for sufficiently large n, $(\theta_0 - 1)\delta_n \sum_{x=M}^{y} a(x+1) + Q(M) \le Q(M)/2 < 0$ for all $y \ge M$. Note that $a(x+1)/a(x) = (x+1)^{-1}$, which is positive, bounded above by 1, and decreasing in x for $x = 0, 1, 2, \ldots$. By the preceding facts and Lemma 4.1, we obtain:


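$$\begin{aligned} E_1 &\subset \bigcup_{y \ge M}\Big\{\sum_{x=M}^{y} [f_{in}(x+1) - f(x+1)] < \frac{Q(M)}{4}\Big\} \cup \bigcup_{y \ge M}\Big\{\sum_{x=M}^{y} \frac{a(x+1)}{a(x)}[f_{in}(x) - f(x)] > -\frac{Q(M)}{4\theta_0}\Big\} \\ &\subset \bigcup_{y \ge M}\Big\{\Big|\sum_{x=M}^{y} [f_{in}(x+1) - f(x+1)]\Big| > -\frac{Q(M)}{4}\Big\} \cup \bigcup_{y \ge M}\Big\{\sum_{x=M}^{y} [f_{in}(x) - f(x)] > -\frac{Q(M)}{4\theta_0}\Big\} \\ &\subset \Big\{\sup_{y \ge 0} |F_{in}(y) - F(y)| > -Q(M)\min(1, \theta_0^{-1})/8\Big\}, \end{aligned} \qquad (4.11)$$

where $F_{in}(y)$ is the empirical distribution function based on $X(i)$.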


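From (4.10) and (4.11), we obtain

$$\begin{aligned} P\{\varphi^*_{in}(M) < \theta_0, N_{in} \ge M\} &\le P\Big\{\sup_{y \ge 0} |F_{in}(y) - F(y)| > -Q(M)\min(1, \theta_0^{-1})/8\Big\} \\ &\le d\exp\{-2n(Q(M)\min(1, \theta_0^{-1})/8)^2\}, \end{aligned} \qquad (4.12)$$

where the last inequality follows from Lemma 2.1 of Schuster (1969).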
Now, let $\tau_1 = \min\left(2(Q(M)\min(1, \theta_0^{-1})/8)^2, \ln[F(M)]^{-1}\right)$. Clearly $\tau_1 > 0$. Combining (4.8), (4.9) and (4.12) gives the result of this theorem.

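Theorem 4.4. $P\{\varphi^*_{in}(m) \ge \theta_0\} \le O(\exp(-\tau_2 n))$, where $\tau_2 = [R^*(m)\min(1, \theta_0^{-1})]^2/8 > 0$ and $R^*(m)$ is defined below.

Proof: From (3.1)-(3.3) and the definition of $\varphi^*_{in}(m)$,

$$\begin{aligned} \{\varphi^*_{in}(m) \ge \theta_0\} &\subset \{\varphi_{in}(x) \ge \theta_0 \text{ for some } 0 \le x \le m\} \\ &\subset \{a(x)\Delta_{in}(x+1) - \theta_0 a(x+1)\Delta_{in}(x) \ge R(x) - a(x)a(x+1)\delta_n[1 - \theta_0] \text{ for some } 0 \le x \le m\}, \end{aligned} \qquad (4.13)$$

where $\Delta_{in}(x) = f_{in}(x) - f(x)$ and $R(x) = -a(x)f(x+1) + \theta_0 a(x+1)f(x) = a(x+1)f(x)[-\varphi(x) + \theta_0] > 0$, since $\theta_0 - \varphi(x) \ge \theta_0 - \varphi(m) > 0$ by the definition of m and the fact that $0 \le x \le m$. Thus, $R^*(m) = \min_{0 \le x \le m} R(x) > 0$, and therefore, for sufficiently large n, $R(x) - a(x)a(x+1)\delta_n[1 - \theta_0] \ge R^*(m)/2$ since $\delta_n = o(1)$. Therefore, from (4.13) and Theorem 1 of Hoeffding (1963),

$$\begin{aligned} P\{\varphi^*_{in}(m) \ge \theta_0\} &\le \sum_{x=0}^{m} \left[P\{\Delta_{in}(x+1) \ge R^*(m)/(4a(x))\} + P\{\Delta_{in}(x) \le -R^*(m)/(4\theta_0 a(x+1))\}\right] \\ &\le \sum_{x=0}^{m} \left[c\exp\{-2n[R^*(m)/(4a(x))]^2\} + c\exp\{-2n[R^*(m)/(4\theta_0 a(x+1))]^2\}\right] \\ &= O(\exp(-\tau_2 n)). \end{aligned}$$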

Based on the preceding discussions, we have the following result.

Theorem 4.5. Assume that the prior distribution G is such that $\int_0^\infty \theta\,dG(\theta) < \infty$ and $m < \infty$. Then, for the empirical Bayes rule $d^*_n$, $0 \le r(G, d^*_n) - r(G) \le O(\exp(-\tau n + \ln n))$, where $\tau = \min(\tau_1, \tau_2) > 0$.

Proof: By (4.4), Theorem 4.3 and Theorem 4.4, we have

$$0 \le r(G, d^*_n) - r(G) \le O(n\exp(-\tau n)) = O(\exp(-\tau n + \ln n)).$$
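The constants in Theorems 4.3-4.5 are explicit once G is fixed. As an illustration only, the sketch below evaluates $\tau_1$, $\tau_2$ and $\tau$ for a gamma prior as in Section 3.2. It drops the $\delta_n$ terms (they vanish as $n \to \infty$), uses $\min(1, \theta_0^{-1})$ as the scale factor, and takes $Q(M) = \frac{a(M+1)}{a(M)} f(M)[\theta_0 - \varphi(M)]$; names and parameter values are hypothetical.

```python
import math


def nb_pmf(x, k, beta):
    """Negative binomial marginal f(x) under the gamma prior of Section 3.2."""
    return math.exp(math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(beta / (1 + beta)) - x * math.log(1 + beta))


def rate_constants(k, beta, theta0):
    """tau_1 (Theorem 4.3), tau_2 (Theorem 4.4) and tau = min(tau_1, tau_2),
    with phi(x) = (x + k)/(1 + beta) and a(x) = 1/x!; delta_n terms are ignored."""
    phi = lambda x: (x + k) / (1 + beta)
    a = lambda x: 1.0 / math.factorial(x)
    f = lambda x: nb_pmf(x, k, beta)
    M = 0
    while phi(M) <= theta0:
        M += 1
    m = M - 1
    while m >= 0 and phi(m) >= theta0:
        m -= 1
    scale = min(1.0, 1.0 / theta0)
    Q_M = a(M + 1) / a(M) * f(M) * (theta0 - phi(M))       # Lemma 4.2
    F_M = sum(f(x) for x in range(M + 1))                   # marginal cdf at M
    tau1 = min(2 * (Q_M * scale / 8) ** 2, math.log(1 / F_M))
    if m < 0:
        tau2 = math.inf   # B(theta0) is empty, so the b_1 term in (4.4) vanishes
    else:
        R_star = min(a(x + 1) * f(x) * (theta0 - phi(x)) for x in range(m + 1))
        tau2 = (R_star * scale) ** 2 / 8
    return tau1, tau2, min(tau1, tau2)


print(rate_constants(2.0, 1.0, 2.2))
```

For these parameter values $\tau$ comes out below $10^{-6}$, so under these assumptions the exponential bound only becomes informative for very large n.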

4.2.  Asymptotic  Optimality  of  $\hat{d}_n$

We let $M_1(t)$ and $M_2(t)$ denote the moment generating functions of $X_1$ and $X_1^2$, respectively. For each real value a, define

$$m_1(a) = \inf_t e^{-at}M_1(t), \qquad m_2(a) = \inf_t e^{-at}M_2(t),$$

where the infimum is taken with respect to real values of t.

Lemma 4.6. For any positive constant c,

$$0 \le m_i(\mu_i + c) < 1, \quad 0 \le m_i(\mu_i - c) < 1 \quad \text{for } i = 1, 2,$$

where $\mu_1 = E[X_1]$ and $\mu_2 = E[X_1^2]$.

Proof: For the fixed real value a, consider the function

$$S_1(t) = e^{-at}M_1(t) = E[e^{t(X_1 - a)}].$$

We have

$$S_1^{(1)}(t) = E[(X_1 - a)e^{t(X_1 - a)}], \qquad S_1^{(2)}(t) = E[(X_1 - a)^2 e^{t(X_1 - a)}],$$

where $S_1^{(j)}(t)$ denotes the j-th derivative of $S_1(t)$ with respect to t.

Since $S_1^{(2)}(t) > 0$ for all t, $S_1(t)$ is a convex function. Also, $S_1^{(1)}(0) = E[X_1 - a] < (=, >)\ 0$ iff $\mu_1 < (=, >)\ a$. Thus, when $\mu_1 < a$, $S_1^{(1)}(0) < 0$, which implies that $S_1(t)$ is strictly decreasing in a neighborhood of zero. Also, $S_1(0) = 1$. Therefore, $m_1(a) < 1$ if $\mu_1 < a$. Similarly, we can also obtain the following result: $m_1(a) < 1$ if $\mu_1 > a$. Now, by definition, $m_1(a) \ge 0$. These results yield that $0 \le m_1(\mu_1 + c) < 1$ and $0 \le m_1(\mu_1 - c) < 1$ for any positive constant c.

The results that $0 \le m_2(\mu_2 + c) < 1$ and $0 \le m_2(\mu_2 - c) < 1$ for any positive constant c follow from similar arguments.
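The rate $m_1(a)$ can be computed in closed form when G is a gamma prior, because $X_1$ is then negative binomial with $M_1(t) = p^k(1 - qe^t)^{-k}$ for $qe^t < 1$, where $p = \beta/(1 + \beta)$ and $q = 1/(1 + \beta)$. The sketch below (hypothetical names; only $m_1$ is treated) minimizes the convex function $-at + \log M_1(t)$ at its stationary point and prints $m_1(\mu_1 \pm c)$, which Lemma 4.6 asserts are below 1.

```python
import math


def nb_log_mgf(t, k, beta):
    """log M_1(t) for the negative binomial marginal; +inf where the mgf diverges."""
    q = 1.0 / (1.0 + beta)
    if q * math.exp(t) >= 1.0:
        return math.inf
    p = beta / (1.0 + beta)
    return k * (math.log(p) - math.log(1.0 - q * math.exp(t)))


def chernoff_m1(a, k, beta):
    """m_1(a) = inf_t e^{-a t} M_1(t) for a > 0, evaluated at the stationary point
    t* solving d/dt[-a t + log M_1(t)] = 0, i.e. q e^{t*} = a / (k + a)."""
    if a <= 0:
        raise ValueError("this sketch assumes a > 0")
    q = 1.0 / (1.0 + beta)
    t_star = math.log(a / ((k + a) * q))
    return math.exp(-a * t_star + nb_log_mgf(t_star, k, beta))


k, beta = 2.0, 1.0                     # illustrative gamma prior; mu_1 = k / beta = 2
mu1 = k / beta
for c in (0.5, 1.0):
    print(c, chernoff_m1(mu1 + c, k, beta), chernoff_m1(mu1 - c, k, beta))
```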

Lemma 4.7. For each $i = 1, \ldots, n$, let $\mu_{1n}(i)$ and $\mu_{2n}(i)$ be the moment estimators of $\mu_1$ and $\mu_2$, respectively, which are defined in Section 3. Then, for any positive constant c,

(a) $P\{\mu_{1n}(i) - \mu_1 \le -c\} \le [m_1(\mu_1 - c)]^{n-1}$,

(b) $P\{\mu_{1n}(i) - \mu_1 \ge c\} \le [m_1(\mu_1 + c)]^{n-1}$,

(c) $P\{\mu_{2n}(i) - \mu_2 \le -c\} \le [m_2(\mu_2 - c)]^{n-1}$ and

(d) $P\{\mu_{2n}(i) - \mu_2 \ge c\} \le [m_2(\mu_2 + c)]^{n-1}$.

Proof: This lemma is a direct application of Chernoff (1952). The proof can be completed by noting the fact that $0 < E[X_1] < \infty$ and $0 < E[X_1^2] < \infty$.


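Let $\mu = \mu_2 - \mu_1 - \mu_1^2$. Thus, $\mu > 0$; see Section 3. Define $\lambda = \max\left(m_2(\mu_2 - \tfrac{\mu}{3}), m_1(\mu_1 + \tfrac{\mu}{3}), m_1(\mu_1 + \tfrac{\mu}{9\mu_1}), m_1(2\mu_1)\right)$. By Lemma 4.6, $0 \le \lambda < 1$.

Lemma 4.8. $P\{\mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) \le 0\} \le O(\exp(-\alpha_1 n))$, where

$$\alpha_1 = \begin{cases} -\ln\lambda & \text{if } \lambda > 0, \\ \infty & \text{if } \lambda = 0. \end{cases}$$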

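Proof:

$$\begin{aligned} P\{\mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) \le 0\} &= P\{[\mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i)] - [\mu_2 - \mu_1 - \mu_1^2] \le -\mu\} \\ &\le P\{\mu_{2n}(i) - \mu_2 \le -\tfrac{\mu}{3}\} + P\{\mu_{1n}(i) - \mu_1 \ge \tfrac{\mu}{3}\} + P\{\mu_{1n}^2(i) - \mu_1^2 \ge \tfrac{\mu}{3}\}. \end{aligned}$$

By Lemma 4.7,

$$P\{\mu_{2n}(i) - \mu_2 \le -\tfrac{\mu}{3}\} \le [m_2(\mu_2 - \tfrac{\mu}{3})]^{n-1}, \qquad P\{\mu_{1n}(i) - \mu_1 \ge \tfrac{\mu}{3}\} \le [m_1(\mu_1 + \tfrac{\mu}{3})]^{n-1},$$

and

$$\begin{aligned} P\{\mu_{1n}^2(i) - \mu_1^2 \ge \tfrac{\mu}{3}\} &= P\{\mu_{1n}^2(i) - \mu_1^2 \ge \tfrac{\mu}{3}, \mu_{1n}(i) \le 2\mu_1\} + P\{\mu_{1n}^2(i) - \mu_1^2 \ge \tfrac{\mu}{3}, \mu_{1n}(i) > 2\mu_1\} \\ &\le P\{\mu_{1n}(i) - \mu_1 \ge \tfrac{\mu}{9\mu_1}\} + P\{\mu_{1n}(i) - \mu_1 \ge \mu_1\} \\ &\le [m_1(\mu_1 + \tfrac{\mu}{9\mu_1})]^{n-1} + [m_1(2\mu_1)]^{n-1}. \end{aligned} \qquad (4.14)$$

Combining the preceding results, the lemma follows.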

Theorem 4.9. $P\{\hat\varphi_{in}(M) < \theta_0\} \le O(\exp(-\alpha_2 n))$ for some positive constant $\alpha_2$.
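Proof:

$$P\{\hat\varphi_{in}(M) < \theta_0\} = P\{\hat\varphi_{in}(M) < \theta_0, \mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) \le 0\} + P\{\hat\varphi_{in}(M) < \theta_0, \mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) > 0\}, \qquad (4.15)$$

where

$$P\{\hat\varphi_{in}(M) < \theta_0, \mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) \le 0\} \le O(\exp(-\alpha_1 n)) \quad \text{by Lemma 4.8.} \qquad (4.16)$$

Now, let $q(M) = M(\mu_2 - \mu_1 - \mu_1^2) + \mu_1^2 - \theta_0(\mu_2 - \mu_1^2)$. By the definition of M, $q(M) > 0$. Thus,

$$\begin{aligned} &P\{\hat\varphi_{in}(M) < \theta_0, \mu_{2n}(i) - \mu_{1n}(i) - \mu_{1n}^2(i) > 0\} \\ &\le P\{(M - \theta_0)\mu_{2n}(i) - M\mu_{1n}(i) - (M - 1 - \theta_0)\mu_{1n}^2(i) < 0\} \\ &= P\{(M - \theta_0)(\mu_{2n}(i) - \mu_2) - M(\mu_{1n}(i) - \mu_1) - (M - 1 - \theta_0)(\mu_{1n}^2(i) - \mu_1^2) < -q(M)\} \\ &\le P\{(M - \theta_0)(\mu_{2n}(i) - \mu_2) < -\tfrac{q(M)}{3}\} + P\{M(\mu_{1n}(i) - \mu_1) > \tfrac{q(M)}{3}\} + P\{(M - 1 - \theta_0)(\mu_{1n}^2(i) - \mu_1^2) > \tfrac{q(M)}{3}\}. \end{aligned} \qquad (4.17)$$

By Lemma 4.7 (if $M = 0$, the probability in (4.18) is 0),

$$P\{M(\mu_{1n}(i) - \mu_1) > \tfrac{q(M)}{3}\} \le \left[m_1\left(\mu_1 + \tfrac{q(M)}{3M}\right)\right]^{n-1}, \qquad (4.18)$$

$$P\{(M - \theta_0)(\mu_{2n}(i) - \mu_2) < -\tfrac{q(M)}{3}\} \le \begin{cases} \left[m_2\left(\mu_2 - \tfrac{q(M)}{3(M - \theta_0)}\right)\right]^{n-1} & \text{if } M - \theta_0 > 0, \\ 0 & \text{if } M - \theta_0 = 0, \\ \left[m_2\left(\mu_2 + \tfrac{q(M)}{3(\theta_0 - M)}\right)\right]^{n-1} & \text{if } M - \theta_0 < 0, \end{cases} \qquad (4.19)$$

and, analogous to (4.14),

$$P\{(M - 1 - \theta_0)(\mu_{1n}^2(i) - \mu_1^2) > \tfrac{q(M)}{3}\} \le \begin{cases} \left[m_1\left(\mu_1 + \tfrac{q(M)}{9\mu_1(M - 1 - \theta_0)}\right)\right]^{n-1} + [m_1(2\mu_1)]^{n-1} & \text{if } M - 1 - \theta_0 > 0, \\ 0 & \text{if } M - 1 - \theta_0 = 0, \\ \left[m_1\left(\mu_1 + \tfrac{q(M)}{6\mu_1(M - 1 - \theta_0)}\right)\right]^{n-1} & \text{if } M - 1 - \theta_0 < 0. \end{cases} \qquad (4.20)$$

Combining (4.15)-(4.20), and by Lemma 4.6, it follows that there exists a positive constant, say $\alpha_2$, such that $P\{\hat\varphi_{in}(M) < \theta_0\} \le O(\exp(-\alpha_2 n))$.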
Theorem 4.10. $P\{\hat\varphi_{in}(m) \ge \theta_0\} \le O(\exp(-\alpha_3 n))$ for some positive constant $\alpha_3$.

Proof: The proof is analogous to that of Theorem 4.9. We omit the details here.

The  following  theorem  is  a  direct  result  of  (4.4)  and  Theorems  4.9  and  4.10. 

Theorem 4.11. Let $\hat{d}_n$ be the empirical Bayes rule defined in Section 3.2. Assume that the prior distribution G is a member of the gamma distribution family. Then,

$$0 \le r(G, \hat{d}_n) - r(G) \le O(\exp(-\alpha n + \ln n)),$$

where $\alpha = \min(\alpha_2, \alpha_3) > 0$.

4.3.  Asymptotic  Optimality  of  $\tilde{d}_n$

Theorem 4.12. Let $\tilde{d}_n$ be the empirical Bayes rule defined in Section 3.3. Assume that the prior distribution G is a member of the gamma distribution family $\Gamma(k, \beta)$, where k is a known positive constant. Then,

$$0 \le r(G, \tilde{d}_n) - r(G) \le O(\exp(-\gamma n + \ln n))$$

for some positive constant γ.

Note that the statistical model considered here is simpler than that of Section 4.2. Thus, the proof of Theorem 4.12 is analogous to, and simpler than, that of Theorem 4.11. We omit the details here.